High-level ETL for semantic data warehouses
Authors
Abstract
The popularity of the Semantic Web (SW) encourages organizations to organize and publish semantic data using the RDF model. This growth poses new requirements to Business Intelligence (BI) technologies to enable On-Line Analytical Processing (OLAP)-like analysis over semantic data. The incorporation of semantic data into a Data Warehouse (DW) is not supported by traditional Extract-Transform-Load (ETL) tools because they do not consider semantic issues in the integration process. In this paper, we propose a layer-based integration process and a set of high-level RDF-based ETL constructs required to define, map, extract, process, transform, integrate, update, and load (multidimensional) semantic data. Different from other ETL tools, we automate the ETL data flows by creating metadata at the schema level. Therefore, it relieves ETL developers from the burden of manual mapping at the ETL operation level. We create a prototype, named Semantic ETL Construct (SETLCONSTRUCT), based on the innovative ETL constructs proposed here. To evaluate SETLCONSTRUCT, we create a multidimensional semantic DW by integrating a Danish Business dataset and an EU Subsidy dataset using it, and compare it with the previous programmable framework SETLPROG in terms of productivity, development time, and performance. The evaluation shows that 1) SETLCONSTRUCT uses 92% fewer Number of Typed Characters (NOTC) than SETLPROG, and SETLAUTO (the extension of SETLCONSTRUCT for generating the ETL execution flow automatically) further reduces the Number of Used Concepts (NOUC) by another 25%; 2) using SETLCONSTRUCT, the development time is almost cut in half compared to SETLPROG, and is cut by another 27% using SETLAUTO; 3) SETLCONSTRUCT is scalable and has similar performance compared to SETLPROG. We also evaluate our approach qualitatively by interviewing two ETL experts.
Similar resources
MODETL: A Complete MODeling and ETL Method for Designing Data Warehouses from Semantic Databases
In the last decades, Semantic DataBases (SDB) have emerged and the major DBMS editors provide semantic support in their products. This is mainly due to the spectacular development of ontologies in several important domains like E-commerce, Engineering, Medicine, etc. Note that ontologies can be seen as a natural continuity of conceptual models. Contrary to traditional databases, where their instance...
Formalizing ETL Jobs for Incremental Loading of Data Warehouses
Extract-transform-load (ETL) tools are primarily designed for data warehouse loading, i.e. to perform physical data integration. When the operational data sources happen to change, the data warehouse gets stale. To ensure data timeliness, the data warehouse is refreshed on a periodical basis. The naive approach of simply reloading the data warehouse is obviously inefficient. Typically, only a s...
Query Optimizer for the ETL Process in Data Warehouses
The ETL (Extraction-Transformation-Loading) process is responsible for extracting data from several sources, cleansing, transforming, integrating and loading it into a data warehouse. The extraction process accesses a large amount of data by executing several complex queries in source databases. These queries are repetitive and executed at regular intervals to refresh the data warehouse. Extraction of data f...
Incremental ETL Pipeline Scheduling for Near Real-Time Data Warehouses
We present our work based on an incremental ETL pipeline for on-demand data warehouse maintenance. Pipeline parallelism is exploited to concurrently execute a chain of maintenance jobs, each of which takes a batch of delta tuples extracted from source-local transactions with commit timestamps preceding the arrival time of an incoming warehouse query and calculates final deltas to bring relevant ...
Building data warehouses with semantic web data
The Semantic Web (SW) deployment is now a realization and the amount of semantic annotations is ever increasing thanks to several initiatives that promote a change in the current Web towards the Web of Data, where the semantics of data become explicit through data representation formats and standards such as RDF/(S) and OWL. However, such initiatives have not yet been accompanied by efficient i...
Journal
Journal title: Semantic Web
Year: 2021
ISSN: 2210-4968, 1570-0844
DOI: https://doi.org/10.3233/sw-210429